Iterative unsupervised speaker adaptation for batch dictation

Authors

  • Shigeru Homma
  • Jun-ichi Takahashi
  • Shigeki Sagayama

Abstract

This paper describes an automatic batch-style dictation paradigm in which the entire dictated speech is used for speaker adaptation and is then recognized using the adapted models. The key point is that the same speech data serves both as the recognition target and as the adaptation data. Two steps, speech recognition and speaker adaptation that uses the recognition results as supervision, are iterated to maximize the advantage of closed-data speaker adaptation. In a practical application, batch-style speech-to-text conversion of recorded dictation of Japanese medical diagnoses, recognition errors are reduced by 37% compared to speaker-independent recognition. To select only reliable recognition results, a supervision improvement procedure is used that eliminates erroneous recognition results from the supervision. In this procedure, 59-74% of the data are extracted from the tentative recognition results, and their reliability is 89-93%. This procedure also reduces recognition errors by 45%.
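
The recognize-then-adapt loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the "acoustic model" is a toy mapping from labels to one-dimensional means, `recognize_nearest` and `adapt_means` are hypothetical helpers, and the confidence score (a distance margin) only stands in for the paper's supervision improvement procedure, whose details the abstract does not give.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[str, float]          # toy model: label -> mean (assumption)
Hypothesis = Tuple[str, float]    # (recognized label, confidence)


def recognize_nearest(model: Model, x: float) -> Hypothesis:
    """Toy recognizer: pick the nearest mean; confidence is the distance margin."""
    ranked = sorted(model, key=lambda label: abs(x - model[label]))
    best, runner_up = ranked[0], ranked[1]  # needs at least two labels
    confidence = abs(x - model[runner_up]) - abs(x - model[best])
    return best, confidence


def adapt_means(model: Model, supervised: List[Tuple[float, str]]) -> Model:
    """Toy adaptation: re-estimate each mean from the data labeled with it."""
    adapted = dict(model)
    for label in model:
        values = [x for x, lab in supervised if lab == label]
        if values:  # keep the old mean if no data was assigned to this label
            adapted[label] = sum(values) / len(values)
    return adapted


def iterative_adaptation(
    utterances: List[float],
    model: Model,
    recognize: Callable[[Model, float], Hypothesis],
    adapt: Callable[[Model, List[Tuple[float, str]]], Model],
    n_iterations: int = 3,
    min_confidence: float = 0.0,
) -> Tuple[Model, List[Hypothesis]]:
    """Alternate recognition and adaptation on the same batch of data."""
    hypotheses = [recognize(model, u) for u in utterances]
    for _ in range(n_iterations):
        # Use only sufficiently confident recognition results as supervision.
        supervised = [
            (u, label)
            for u, (label, conf) in zip(utterances, hypotheses)
            if conf >= min_confidence
        ]
        if not supervised:
            break
        model = adapt(model, supervised)
        hypotheses = [recognize(model, u) for u in utterances]
    return model, hypotheses


if __name__ == "__main__":
    # Hypothetical speaker whose data is shifted away from the initial means.
    data = [1.2, 1.5, 1.8, 2.3, 5.0, 5.5, 6.0]
    speaker_independent = {"a": 0.0, "b": 4.0}
    adapted, hyps = iterative_adaptation(
        data, speaker_independent, recognize_nearest, adapt_means
    )
    print(adapted, [label for label, _ in hyps])
```

In this toy setup the initial model assigns the sample 2.3 to label "b"; after one adaptation pass the means move toward the speaker's data and the same sample is re-recognized as "a". Raising `min_confidence` keeps low-margin results out of the supervision, which mirrors the idea of selecting only reliable recognition results.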

Similar resources

Improved estimation of supervision in unsupervised speaker adaptation

Unsupervised speaker adaptation plays an important role in "batch dictation," the aim of which is to automatically transcribe large amounts of recorded dictation using speech recognition. In the case of unsupervised speaker adaptation which uses recognition results of target speech as the means of supervision, erroneous recognition results degrade the quality of the adapted acoustic models. Thi...

Speaker adaptation in the Philips system for large vocabulary continuous speech recognition

The combination of Maximum Likelihood Linear Regression (MLLR) with Maximum a posteriori (MAP) adaptation has been investigated for both the enrollment of a new speaker as well as for the asymptotic recognition rate after several hours of dictation. We show that a least mean square approach to MLLR is quite effective in conjunction with phonetically derived regression classes. Results are presen...

Long term on-line speaker adaptation for large vocabulary dictation

On-line speaker adaptation is desirable for speech recognition dictation applications, because it offers the possibility to improve the system with the speaker-specific data obtained from the user. Since the user will work with such a device over a long period, for a dictation system the long term adaptation performance is more important than the adaptation speed. In contrast to speaker-dependent...

Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode

In this work the theoretical concepts of unsupervised acoustic model training and the application and evaluation of unsupervised training schemes are described. Experiments aiming at speaker adaptation via unsupervised training are conducted on the KIT lecture translator system. Evaluation takes place with respect to training efficiency and overall system performance in dependency of the availabl...

Unsupervised model adaptation

This paper deals with unsupervised model adaptation for speaker recognition. Two adaptation schemes are proposed, the first one is based on a test by test model adaptation and the second one proposes a batch mode, where the adaptation is performed using a set of tests before computing the decision score for each of them. The experiments are conducted thanks to the NIST SRE 2005 database. This p...

Publication date: 1996